Abstract

Folate-sensitive fragile sites (FSFS) are a rare cytogenetically visible subset of dynamic mutations. Of the eight molecularly
characterized FSFS, four are associated with intellectual disability (ID). Cytogenetic expression results from CGG
trinucleotide-repeat expansion mutation associated with local CpG hypermethylation and transcriptional silencing. The best
studied is the FRAXA site in the FMR1 gene, where large expansions cause fragile X syndrome, the most common inherited
ID syndrome. Here we studied three families with FRA2A expression at 2q11 associated with a wide spectrum of
neurodevelopmental phenotypes. We identified a polymorphic CGG repeat in a conserved, brain-active alternative
promoter of the AFF3 gene, an autosomal homolog of the X-linked AFF2/FMR2 gene; expansion of the AFF2 CGG repeat
causes FRAXE ID. We found that FRA2A-expressing individuals have mosaic expansions of the AFF3 CGG repeat in the range
of several hundred repeat units. Moreover, bisulfite sequencing and pyrosequencing both suggest AFF3 promoter
hypermethylation. cSNP analysis demonstrates monoallelic expression of the AFF3 gene in FRA2A carriers, thus predicting
that FRA2A expression results in functional haploinsufficiency for AFF3 at least in a subset of tissues. By whole-mount in situ
hybridization the mouse Aff3 ortholog shows strong regional expression in the developing brain, somites and limb buds in
9.5-12.5 dpc mouse embryos. Our data suggest that there may be an association between FRA2A and a delay in the
acquisition of motor and language skills in the families studied here. However, additional cases are required to firmly
establish a causal relationship.
Author Summary

Some human genetic diseases are caused by dynamic
mutations, or expansions of a short repeated sequence in
the genome that can be unstably passed on from
generation to generation. A subset of these dynamic
mutations known as fragile sites can be seen as a break or
gap on the chromosome when cells are cultured under
specific conditions. To date eight folate-sensitive fragile
sites (FSFS) have been characterized, and all are due to
CGG-repeat expansions within the 5' UTR or promoter
region of the respective gene. When the repeat expands in
size, it becomes hypermethylated and the adjacent gene
or genes are transcriptionally silenced. For at least four of
the eight known fragile sites this silencing of the
associated gene(s) leads to intellectual disability syndromes
such as fragile X. In this work we describe molecular
characterization of an autosomal FSFS called FRA2A on
chromosome 2. As the molecular cause of FRA2A, we
identify an expansion of a CGG repeat which subsequently
results in silencing of the neighbouring gene AFF3. This
gene is one of the four autosomal paralogs of the AFF2/
FMR2 gene which, when mutated, is the cause of the
FRAXE syndrome. We find that FRA2A expression is
associated with highly variable developmental anomalies
in the three FRA2A families studied.

Introduction

Dynamic mutations are heritable unstable expansions of short,
genomic repeat sequences. Various pathogenic mechanisms have
been associated with dynamic mutations [1,2] and at least 40
neurological, neurodegenerative and neuromuscular disorders are
known to be caused by these types of mutations [3,4]. Expansions
of these unstable sequences may occur in promoters, coding
regions, introns and 3' and 5' untranslated regions (UTR) of genes
[5,6,7]. Known and putative disease mechanisms include aberrant
splicing [8], loss or gain of function of the encoded protein [9,10],
the expanded repeat itself [11] or its RNA transcript [12,13] and
Repeat Associated Non-ATG translation (RAN translation)
[14,15]. The size threshold at which a repeat becomes unstable
and/or pathogenic varies widely, from the expansion of only a few
trinucleotide repeats in e.g. ARX-associated infantile epileptic
encephalopathy (MIM 308350) to over a thousand repeats in e.g.
DMPK-associated myotonic dystrophy (MIM 160900), FXN-associated
Friedreich ataxia (MIM 229300) and FMR1-associated
fragile X syndrome (MIM 300624) [16,17,18].

Fragile sites represent a specific subset of dynamic mutations
that are visible as gaps or breaks on metaphase chromosomes from
cells cultured under specific conditions. Fragile sites are categorised
by the nature of the inducing culture condition and the
population frequency of the mutation [19]. FRAXA is a rare,
folate sensitive fragile site (FSFS) associated with a trinucleotide
repeat (CGG) expansion mutation in the 5' UTR of the FMR1
gene resulting in fragile X syndrome, the most common inherited
intellectual disability syndrome [20]. Twenty-six other FSFS have
been reported cytogenetically but only eight of these have been
molecularly characterized: FRAXA [20], FRAXE [21], FRAXF
[22], FRA16A [23], FRA11B [24], FRA10A [25], FRA12A [26]
and FRA11A [27]. To date, all characterized FSFS are due to a
CCG/CGG trinucleotide repeat expansion. The expanded repeat
and any adjacent CpG island become hypermethylated and
transcriptionally silenced at a locus-specific repeat size-threshold
[28]. At least four of the eight characterized rare, folate sensitive
fragile sites are associated with a neurodevelopmental disorder.
The relevance of folate sensitive fragile sites to intellectual
disability (ID) is strengthened by five independent population
studies that have all shown that autosomal folate sensitive fragile
sites are overrepresented in people with ID compared to control
groups without ID, with a prevalence of 1.2% and 0.27%
respectively [reviewed in 29]. It thus seems likely that as yet
uncharacterized CGG/CCG repeat expansions may be associated
with neurodevelopmental problems.

An autosomal FSFS on chromosomal band 2q11.2-q12 has
been previously described [30,31,32]. We studied three families
with FRA2A expression and a wide spectrum of neurodevelopmental
and other anomalies. We identified expansion of an
intronic CGG repeat leading to hypermethylation of at least one
promoter of the AFF3 gene in all FRA2A carriers and we
hypothesise that the associated transcriptional silencing of AFF3 in
the brain may be responsible for some of the developmental
features observed in FRA2A carriers.

Results 

FRA2A is due to expansion of a polymorphic CGG repeat 
within AFF3 

Using the simple repeat track on the UCSC genome browser
(GRCh37/hg19) we identified three candidate CGG repeats in the
FRA2A-containing region (2q11-12). One of these repeats is
located within the LAF4/AFF3 gene (chr2:100721261-100721286;
hg19), an autosomal homolog of the FRAXE-associated FMR2/
AFF2 gene. In order to determine whether this CGG repeat is
expanded in FRA2A we used metaphase FISH analysis on a
FRA2A-expressing individual (Figure 1; AII.3) with the BAC clone
RP11-549H5 (chr2:100,588,792-100,759,365; hg19) encompassing
the repeat. The FISH signal spanned the FRA2A fragile site
(Figure 2A). Consistent with this, the FISH signals from probes
RP11-436F6 (AC010736) and RP11-506F3 (AC074387) were
centromeric and telomeric to FRA2A, respectively. To establish
co-location of the CGG repeat with the fragile site, long-range
PCR-generated probes L10K (chr2:100721983-100733233; hg19)
and L18K (chr2:100700447-100718834; hg19) were targeted to
the genomic regions on either side of the (CGG)n repeat. These
probes did indeed flank the fragile site, locating FRA2A to a 3.1 kb
interval within the AFF3 gene (Figure 2B). The second red (L18K)
FISH signal observed at 2q13 (Figure 2B) is the result of two copies
of a 24 kb low copy repeat flanking the NPHP1 gene (Figure 2C)
(chr2:110520380-110538822 and chr2:111347822-111366260;
hg19). A copy of this sequence is also located adjacent to the
CGG repeat within the AFF3 gene in the region covered by L18K.

[Figure 1: pedigrees of Families A, B and C. Symbol key: FRA2A observed cytogenetically; FRA2A not observed/examined cytogenetically; normal adult, significant developmental delay as child; moderate intellectual disability, anal fisture, mild asthma; premature birth, mild neurodevelopmental problems.]

Figure 1. Description of FRA2A Families A-C. Females are represented by circles, males by squares. The percentage of cells showing FRA2A
expression is indicated on the bottom right-hand side of the symbol. As discussed in the text, the 4% FRA2A expression seen in individual AI.1
represents a false positive. The case number used to indicate the FRA2A carriers in the text is given on the top left-hand side of the symbol. N within a
symbol indicates individuals in whom expression of the fragile site was not examined or observed. The associated phenotypes in individuals AII.3, AII.4, BII.1
and CII.1 are detailed in the boxed symbol key. The proband in each family is arrowed.
doi:10.1371/journal.pgen.1004242.g001

Figure 2. FISH analysis of the FRA2A fragile site. A. The BAC clone 549H5 (labelled green) spans the fragile site FRA2A. B. FISH analysis using
the 10 kb L10K (labelled green; chr2:100721983-100733233; hg19) and 18 kb L18K (labelled red; chr2:100700447-100718834; hg19) PCR-generated
probes, targeted to map either side of FRA2A. The additional telomeric FISH signal on the red channel (red arrow; chr2:110520380-110538822 and
chr2:111347822-111366260; hg19) is the result of a 24 kb low copy repeat (24 kb LCR, pink text) encompassing L18K. C. A schematic representation
of the position of the LCR.
doi:10.1371/journal.pgen.1004242.g002
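The 3.1 kb interval quoted for FRA2A can be checked directly from the probe coordinates given in the text and in the Figure 2 legend. The short sketch below assumes the coordinates are 1-based and inclusive (an assumption about the notation, not something the text states) and simply counts the bases left uncovered between L18K and L10K, then confirms that the reference CGG repeat falls inside that gap.

```python
# Probe and repeat coordinates on chr2 (hg19), as quoted in the text.
# Assumption: coordinates are 1-based and inclusive.
L18K = (100700447, 100718834)
L10K = (100721983, 100733233)
CGG_REPEAT = (100721261, 100721286)


def gap_between(left, right):
    """Number of bases lying strictly between two non-overlapping intervals."""
    return right[0] - left[1] - 1


def lies_within(inner, outer_start, outer_end):
    """True if the interval `inner` is fully contained in [outer_start, outer_end]."""
    return outer_start <= inner[0] and inner[1] <= outer_end


interval_bp = gap_between(L18K, L10K)
repeat_in_gap = lies_within(CGG_REPEAT, L18K[1] + 1, L10K[0] - 1)

print(f"Uncovered interval: {interval_bp} bp (~{interval_bp / 1000:.1f} kb)")
print(f"CGG repeat inside the interval: {repeat_in_gap}")
```

Under these assumptions the gap is a little over 3.1 kb and contains the repeat, in line with the localisation described above.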

PCR-based amplification and sequencing of the AFF3 CGG
repeat in 200 control chromosomes revealed it to be highly
polymorphic with a length ranging from 3 to 20 copies (Figure 3).
The most frequent CGG allele contains eight repeats (as does the
genomic reference sequence; chr2:100721261-100721286; hg19).
To exclude the possibility that apparently homozygous control
individuals are in fact heterozygous for the detected allele in
combination with a large expansion that is not detected by this
protocol, we amplified these samples with gene specific repeat
primed PCR (Asuragen Inc., Austin, Texas, USA). This protocol
enables us to detect expansions up to 300-500 repeats. However,
no expanded repeats were detected in control chromosomes, and
the genotype distribution agreed with Hardy-Weinberg equilibrium
(P>0.05).
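As an illustration of the kind of check behind the Hardy-Weinberg statement, the sketch below computes a chi-square goodness-of-fit statistic for a multi-allelic locus. It is not the authors' procedure, which the text does not describe; the function name and the genotype counts are hypothetical and are not data from this study. A statistic close to zero indicates observed genotype counts close to Hardy-Weinberg expectations.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotype_counts):
    """Chi-square statistic of observed genotype counts against
    Hardy-Weinberg expectations at a (possibly multi-allelic) locus.

    genotype_counts maps an allele pair, e.g. (8, 9) for 8 and 9 CGG units,
    to the number of individuals with that genotype.
    """
    n_individuals = sum(genotype_counts.values())
    allele_counts = Counter()
    observed = Counter()
    for (a, b), n in genotype_counts.items():
        allele_counts[a] += n
        allele_counts[b] += n
        observed[tuple(sorted((a, b)))] += n

    total_alleles = 2 * n_individuals
    freq = {allele: c / total_alleles for allele, c in allele_counts.items()}

    chi2 = 0.0
    for a, b in combinations_with_replacement(sorted(freq), 2):
        # Homozygotes: p^2; heterozygotes: 2pq.
        proportion = freq[a] * freq[b] * (1 if a == b else 2)
        expected = proportion * n_individuals
        chi2 += (observed[(a, b)] - expected) ** 2 / expected
    return chi2


# Hypothetical counts for illustration only (NOT data from the study).
in_equilibrium = {(8, 8): 25, (8, 9): 50, (9, 9): 25}
no_heterozygotes = {(8, 8): 50, (9, 9): 50}

print(hwe_chi_square(in_equilibrium))
print(hwe_chi_square(no_heterozygotes))
```

With many rare alleles, as for a repeat ranging from 3 to 20 copies, expected counts per genotype become small, so an exact or permutation test would usually be preferred over the plain chi-square approximation.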

PCR amplification of the repeat in the FRA2A-expressing
individuals AI.1, AII.3, AII.4, AIII.1, BII.1 and CII.1
(Figure 1) showed a single CGG allele in the normal size range. An
additional smaller fragment was detected in subject AII.4.
Sequence analysis of the smaller PCR product showed a 134-bp
deletion encompassing the entire CGG repeat as well as some
flanking sequences (Figure S1). This deletion was not detected in
800 control chromosomes. To visualize the repeat expansion in
the FRA2A-positive individuals, we performed Southern blotting
on HindIII digested genomic DNA of all available members of the
three families (AI.1, AII.3, AII.4, AIII.1, BI.2, BII.1, CI.1, CI.2
and CII.1) and control samples. A 4.4 kb fragment was observed
in all cases and controls. In five affected FRA2A-positive
individuals we detected additional large fragments or smears
compatible with the presence of an expanded allele (Figure 4).
Two FRA2A-negative parents of FRA2A-positive individuals (BI.2
and CI.2) showed additional larger fragments indicative of repeat
expansion. No evidence of an expanded fragment was observed in
the control samples or in FRA2A-negative individual CI.1.
Interestingly, individual AI.1, who had been reported as showing
a low level of FRA2A-positive cells, showed no fragments
suggestive of an expansion mutation. A Southern blot of the same
samples after NcoI digestion gave very similar results (data not
shown).
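A rough sense of how a Southern blot fragment size translates into repeat units can be had from simple arithmetic. The sketch below assumes that the 4.4 kb HindIII fragment carries the most common allele of eight CGG units and that any extra length is made up entirely of added CGG triplets; both are simplifying assumptions, and the example fragment size is hypothetical rather than a measurement from Figure 4.

```python
NORMAL_FRAGMENT_BP = 4400  # 4.4 kb HindIII fragment seen in all samples
REFERENCE_REPEATS = 8      # assumed repeat count on the normal fragment
BP_PER_REPEAT = 3          # one CGG unit


def estimate_repeat_units(fragment_bp,
                          normal_bp=NORMAL_FRAGMENT_BP,
                          reference_repeats=REFERENCE_REPEATS):
    """Estimate CGG repeat units from a restriction fragment length,
    assuming all extra length relative to the normal fragment is CGG repeat."""
    if fragment_bp < normal_bp:
        raise ValueError("fragment is shorter than the normal fragment")
    return reference_repeats + (fragment_bp - normal_bp) / BP_PER_REPEAT


# Hypothetical expanded fragment of 5.6 kb (illustration only).
print(estimate_repeat_units(5600))
```

Because agarose gel sizing has limited resolution and expanded alleles often appear as smears, such an estimate is only approximate, which is one reason a repeat-primed PCR assay is used for more accurate sizing.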

A gene specific repeat-primed PCR assay was used for accurate
sizing of the repeat expansion mutations. The mothers in families B
and C both showed a slightly expanded allele (approximately 120 and 106
repeat units, respectively) in addition to an allele in the normal size
range (15 and 17 repeat units, respectively), while their offspring
show one allele of 8 units compatible with paternal inheritance
and one allele with a large expansion of more than 300 units
(Figure S2). This strongly suggests that the expanded allele in both
families was inherited from the mother (Table 1). In family A, the
apparently FRA2A-positive individual AI.1 showed no evidence of
an expanded allele on either Southern blot analysis or on repeat
primed PCR. This apparent discrepancy could be resolved
genetically using the microsatellite markers (D2S2209/AFMA246XE9
and D2S2311/AFMB355ZG1, Figure S3). The
FRA2A-positive daughters of AI.1, AII.3 and AII.4, were shown
to have inherited a different non-expanded allele (8 CGG units)
from him, while they share a common allele with the mother
(AI.2). Their FRA2A-positive grand-daughter, AIII.1, also inherited
this maternal allele. The expansion was therefore probably
inherited from AI.2, with the 4% FRA2A expression in AI.1
representing a false-positive cytogenetic result. Unfortunately
DNA was not available from AI.2 to determine if she also carried
an intermediately expanded allele.

Figure 3. Allele frequencies of the FRA2A associated CGG repeat in a population of 100 control individuals. PCR amplification of the
repeat and subsequent sequencing in 200 control chromosomes revealed that it is highly polymorphic with a length ranging from 3 to 20 copies. The
most commonly found allele contains eight repeated units.
doi:10.1371/journal.pgen.1004242.g003

Characterization of the two major AFF3 transcriptional 
start sites (TSS) 

The RefSeq AFF3 gene model consists of 23 coding exons and
two 5' non-coding exons together spanning 558 kb genomic DNA
[33]. Here, the 5' non-coding exons are named exons 1 and 2 with
the first coding exon called exon 3. An AFF3-specific cDNA probe
encompassing the final 5 exons was used for northern blot analysis.
A major transcript of approximately 8 kb, corresponding to the
predicted size, was detected in several tissues, including brain,
placenta and lung (data not shown).

To determine the precise location of the AFF3 transcriptional
start sites (TSSs) in relation to the expanded repeat we used Cap
Analysis of Gene Expression (CAGE) data from the FANTOM4
consortium. FANTOM4 produces sequence tags from the 5' end
of mRNAs from many different tissue sources and species and
maps these to the reference genome [34]. Mapped CAGE tags
reveal the sites of transcription initiation at single nucleotide
resolution and provide a semi-quantitative measure of steady state
mRNA levels for those transcripts using a tags per million (TPM)
metric. The TPM scores for three different human tissue groups
(brain, immune and other tissues) are plotted in Figure 5B. AFF3 is
transcribed in telomeric to centromeric orientation and the x axis
of these graphs represents the hg19 genomic coordinates. The
location of the 25 annotated exons of the RefSeq model of the
AFF3 gene (NM_001025108) is represented above the graphs
using the same genomic coordinates. To assess the transcriptional
activity of AFF3 during early brain development we mapped
transcriptome sequencing reads of mRNA (RNA-seq) from three
different human fetal brain samples to the regions surrounding the
TSS identified by CAGE (Figure 5C).
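The tags per million metric mentioned above is a simple library-size normalisation. The following minimal sketch shows the basic calculation; the function name and the tag counts are hypothetical, and the actual FANTOM4 processing may include additional steps not shown here.

```python
def tags_per_million(tag_count, library_size):
    """Scale a raw CAGE tag count at one TSS by the total number of
    mapped tags in the library, expressed per million tags."""
    if library_size <= 0:
        raise ValueError("library_size must be positive")
    return tag_count * 1_000_000 / library_size


# Hypothetical counts for illustration only (not FANTOM4 data):
# 120 tags at a TSS in a library of 2,000,000 mapped tags.
print(tags_per_million(120, 2_000_000))
```

Normalising in this way allows TSS usage to be compared between libraries of different sequencing depth, which is what makes the tissue-group comparison in Figure 5B meaningful.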

CAGE tag sequencing demonstrates two robust TSSs. The most 
5' TSS corresponds to the 5' end of exon 1 at position
chr2:100759169 (GRCh37/hg19). This TSS is highly expressed
in immune tissues with a mean of 300 tags per million shown by 
the blue bar in the middle graph (Figure 5B). There is no obvious 
transcription of exon 1 on RNA-seq from human fetal brain 
(Figure 5C, left-hand side). A second robust TSS was identified in 
CAGE data from brain and other tissues which mapped within 
intron 2 as shown by the blue bars in the top and bottom graphs in 
Figure 5B. The highest levels are seen in the brain (mean of 60 tags 
per million). An expanded representation of this region is shown in 
the right-hand side of Figure 5C. This shows no evidence of 
transcription of exon 2 but strong expression in exon 3. This also 
shows evidence of an alternative exon 1 immediately 3' to the TSS 
(Figure 5C, right hand side, black arrow). The TSS lies 
immediately downstream of the CGG repeat (Figure 5D) suggest- 
ing this expansion prone repeat is located within the core of an 
alternate AFF3 promoter. 
Figure 4. Southern blot analysis of the AFF3 CGG repeat in all available members of the three families and two unrelated control
individuals (C). DNA restriction fragments obtained from blood samples of all available members of family A (lanes 1-4), B (lanes 5-6) and C (lanes
7-9) were blotted with a specific 32P-labeled probe (chr2:100088460-100089451; hg19) after digestion with HindIII. Lanes 10 and 11 contain DNA
restriction fragments from two unrelated control samples (represented as diamonds) after digestion with HindIII. In addition to a normal size allele
fragment of 4.4 kb, individuals CII.1, CI.2, BII.1, BI.2, AII.4, AII.3 and AIII.1 show additional fragments of a larger size, indicating repeat expansion. These
expanded fragments were not present in the controls and individuals CI.1 and AI.1. A 1 kb length marker is presented at the left of the figure.
doi:10.1371/journal.pgen.1004242.g004



The location and tissue specific activity of the AFF3
promoters are similar in mouse Aff3

FANTOM4 CAGE data also encompasses a range of mouse 
tissues. From this we can demonstrate that both the exon 1 TSS 
and intron 2 TSS are evolutionarily conserved and functional 
TSSs. Although the CGG repeat itself is not conserved, a region of 
low compositional complexity flanked by highly constrained non- 
coding sequences is a conserved feature of the intron 2 TSS 
promoter (Figure 5). 

Whole mount in situ hybridization (WISH) using riboprobes
targeted to the 3' UTR of Aff3 was used to determine the
developmental expression pattern in mouse embryos at 9.5, 10.5,
11.5 and 12.5 days post coitus (dpc). This has shown site and stage
specific expression of Aff3. The most striking areas of expression
are in the somites, the upper limb buds, the diencephalon/
prosomere I and the fusing primary palate (Figure 6).

AFF3 promoter regions are hypermethylated in
individuals with the FRA2A CGG-repeat expansion 

In all rare, folate-sensitive sites characterised to date, CGG
repeat expansions are associated with hypermethylation of the
surrounding CpG island. Bisulfite sequencing indicated hypermethylation
of the CpG island in all five affected FRA2A carriers
AII.3, AII.4, AIII.1, BII.1 and CII.1, while in healthy control
individuals this region was not methylated (Figure S4). In order to
quantify the methylation level, we subsequently subjected all
samples to pyrosequencing. This technique allows accurate
quantification of methylation across individual CpG sites [35]. A
methylation frequency of 50% would be consistent with complete
methylation of the expanded allele as all affected individuals in this
study are heterozygous. We analyzed a short region of genomic
DNA (chr2:100721843-100721885; hg19) adjacent to the CGG
repeat in all available family members, containing four analysable
CpG dinucleotides (Figure S1 and Table 2). Methylation
percentages are congruous across the 4 CpG sites in each
individual (p-values ranging from 0.417 to 1.000) and are
consistent with hypermethylation of the CGG repeat region in
individuals carrying an expanded allele. There is some suggestion
that the methylation frequency may be increasing upon generational
transmission.
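The reasoning that 50% overall methylation corresponds to a fully methylated expanded allele in a heterozygote can be turned into a small calculation. The sketch below assumes that the normal allele is completely unmethylated and that the expanded allele makes up exactly half of the alleles in the sample; both are simplifying assumptions, and mosaicism for expansion size or for a deletion (as in AII.4) would violate them. The per-site means used as input are those listed for AII.3 and BI.2 in Table 2.

```python
from statistics import mean


def expanded_allele_methylation(observed_percent, expanded_allele_fraction=0.5):
    """Estimate the methylation percentage of the expanded allele from the
    overall percentage measured by pyrosequencing.

    Assumes the normal allele is unmethylated and that the expanded allele
    accounts for `expanded_allele_fraction` of all alleles in the sample.
    """
    if not 0 < expanded_allele_fraction <= 1:
        raise ValueError("expanded_allele_fraction must be in (0, 1]")
    return observed_percent / expanded_allele_fraction


# Mean methylation (%) at the four CpG sites, taken from Table 2.
table2_means = {
    "AII.3": [18, 17, 20, 14],
    "BI.2": [27, 26, 28, 22],
}

estimates = {
    individual: expanded_allele_methylation(mean(values))
    for individual, values in table2_means.items()
}
for individual, percent in estimates.items():
    print(f"{individual}: ~{percent:.1f}% of expanded alleles methylated")
```

Under these assumptions an overall reading near 50% would indicate near-complete methylation of the expanded allele, while lower readings suggest partial or mosaic methylation in blood.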

To exclude a non-specific effect of increasing age on the
methylation of this region, we pyrosequenced 72 individuals from
24 unrelated two-generation families. The ages within this control
group varied between 0 and 11 years for the children and between
23 and 53 years for the parents, which is comparable to the age
distribution in our FRA2A families at the time of DNA collection.
No methylation above the threshold was detected in any control
individual (data not shown). In the FRA2A carriers the methylation
level for each of the four CpG sites differed significantly from
the level determined in this control population (p-values ≤0.001 for
CpG sites 1, 2 and 4 and p<0.004 for CpG site 3), suggesting that
expansion of the repeat is associated with hypermethylation of the
region surrounding the repeat.

In AII.4, mosaic for a CGG-repeat expansion and a deletion,
the promoter region on the expanded allele was hypermethylated
as determined by bisulfite sequencing. The allele with the 134-bp
deletion was not methylated, as determined by Southern blotting
after double digestion with BamHI and NotI (data not shown).
Table 1. Sizing the repeat in all available family members with gene specific repeat primed PCR.

[Table body only partially recovered; surviving cells are listed in their original order.]
Individuals: Family C; III.1; I.2; II.1; II.1
Allele 1: 18; 15; 8; 5; 17
Allele 2: >300; >300; >300; ±120; >300; 106; >300

Alleles were sized by gene specific repeat primed PCR. As expansions containing over 300 repeated units cannot be reliably sized using this technique, we indicated
these alleles as ">300". Subject AII.4 is marked with an asterisk as she is mosaic for a 134-bp deletion taking away the entire CGG repeat in combination with a largely
expanded allele, and this in addition to a normal sized allele.
doi:10.1371/journal.pgen.1004242.t001



Monoallelic expression of the AFF3 gene in FRA2A 
carriers 

The level of AFF3 expression in lymphoblastoid cell lines is too
low to be reliably measured by RT-qPCR. To determine if
FRA2A results in transcriptional silencing of AFF3 in cis, we
utilized single nucleotide polymorphisms (SNPs) mapping within
the open reading frame. Two such SNPs were found to be
heterozygous in the genomic DNA of affected FRA2A carrier
BII.1. Rs4851214 maps to exon 14 and heterozygous individuals
display both a T and a C peak (c.1499T/C) on Sanger
sequencing, while rs13427251 maps to exon 25 and heterozygotes
for this SNP show both an A and a T peak (c.5475A/T).
Sequencing cDNA from BII.1 revealed only a C peak at
rs4851214 (Figure 7) and only the A allele peak was seen for
rs13427251. These results are consistent with monoallelic
expression of AFF3 in this individual. Genomic DNA from
BII.1's mother (BI.2) is homozygous for the rs13427251 T allele
(g.5475T/T), indicating that it is the maternal T allele carrying
the CGG expansion that is silenced in BII.1. cDNA of BI.2
showed a heterozygous signal for rs4851214 (c.1499T/C),
indicating that both AFF3 alleles are transcribed in the mother
despite partial methylation of her expanded allele.
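The logic used above to infer monoallelic expression and the parental origin of the silenced allele can be written out explicitly. The sketch below is an illustration of that reasoning only; the function names are hypothetical and the inputs are the allele calls reported in the text, not raw sequencing traces.

```python
def classify_expression(genomic_alleles, cdna_alleles):
    """Compare alleles seen in genomic DNA with those seen in cDNA at one SNP."""
    genomic, cdna = set(genomic_alleles), set(cdna_alleles)
    if not cdna <= genomic:
        raise ValueError("cDNA shows an allele absent from genomic DNA")
    if len(genomic) < 2:
        return "uninformative"  # homozygous SNPs cannot distinguish alleles
    if not cdna:
        return "not expressed"
    return "biallelic" if cdna == genomic else "monoallelic"


def silenced_allele(genomic_alleles, cdna_alleles):
    """Return the allele present in genomic DNA but missing from cDNA, if unique."""
    missing = set(genomic_alleles) - set(cdna_alleles)
    return missing.pop() if len(missing) == 1 else None


def parental_origin(allele, mother_alleles):
    """Infer the origin of a child's allele from the maternal genotype alone."""
    mother = set(mother_alleles)
    if mother == {allele}:
        return "maternal"   # a homozygous mother must transmit this allele
    if allele not in mother:
        return "paternal"   # the mother cannot have transmitted it
    return "undetermined"


# Allele calls as reported in the text for BII.1 and her mother BI.2.
print(classify_expression("TC", "C"))          # rs4851214 in BII.1
print(classify_expression("AT", "A"))          # rs13427251 in BII.1
silenced = silenced_allele("AT", "A")
print(silenced, parental_origin(silenced, "TT"))
print(classify_expression("TC", "TC"))         # rs4851214 in BI.2
```

The inference of parental origin here relies on the mother being homozygous at rs13427251; with a heterozygous mother the origin would remain undetermined without paternal genotype data.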

Analysis of clinical phenotypes associated with FRA2A 

Six FRA2A carriers were initially included in this study
(Figure 1, Table 3), four in Family A and one each in Families
B and C. Individual AIII.1 is currently too young to make any
conclusion about cognitive development. Individual AI.1 has no
discernible affected phenotype and he also has the lowest
expression of the fragile site. The molecular analysis presented
above strongly suggests that the 4% apparent expression in this
case represents a false positive, and so he was excluded from the
clinical analysis. Two FRA2A carriers, AII.3 and AII.4, are adults;
both had global delay in their neurocognitive development to a
level that merited genetic investigations during childhood and
their long-term placement in special educational facilities.
However, as adults both of these individuals are functioning at a
normal level and are in full time employment. This raises the
possibility that they had a true delay in development rather than a
fixed disability. Something similar is observed in FRAXE patients
as most adult FRAXE males adapt to live a normal life. Individual
CII.1 was born prematurely and had significant respiratory
distress, which confounds the unambiguous interpretation of the
cause of her mild developmental delay. BII.1 has the most
significant delay, currently without a plausible non-genetic
explanation. Thus all four of the characterized true FRA2A
carriers did have significant delay in their motor and language
development.

To determine whether the FRA2A carriers with neurodevelopmental
anomalies had additional mutations in the protein coding
region of the AFF3 gene, mutation analysis of all coding exons was
performed. No sequence abnormalities were detected in any of the
affected FRA2A carriers, except in subject AII.4, in whom a 6-bp
deletion was identified in exon 14, removing two amino acids:
threonine and alanine (positions 619 and 621, respectively)
(Figure 5A). Both amino acids are found in a region enriched
with proline, serine and glutamic acid residues and located
between the transactivation domain and the nuclear localisation
signal (NLS). According to different prediction software
(mutationtaster, mutation assessor, Indelz), the deletion is benign.
Moreover, this 6-bp in-frame deletion was also present in the
unaffected father (AI.1).

Discussion 

We provide compelling evidence that the molecular basis of the
FSFS FRA2A is a CGG repeat expansion in an alternative
promoter which is active in the brain and is located in the intron
immediately 5' to the first coding exon of the major AFF3
transcript. The FRA2A-associated repeat is polymorphic in the
general population. Repeat primed PCR showed all individuals
with an expansion of over 300 repeat units expressed FRA2A in
more than 20% of their cells. The expansion was associated with
hypermethylation of a CpG island adjacent to the alternative
promoter and, in at least one case, resulted in transcriptional
silencing of AFF3. These results are consistent with the epigenetic
effects that have been described in other FSFS. Within each of the
three families studied here higher levels of methylation correlate
well with neurodevelopmental delay, higher repeat size and
silencing of AFF3. However, there are striking disparities in the
absolute levels of methylation observed between the families. For
example, individuals AII.3 and AII.4 both have >300 repeats and
had evidence of neurodevelopmental delay during childhood but
have lower levels of methylation than BI.2 (~120 repeats, biallelic
expression of AFF3) and CI.2 (106 repeats), neither of whom
showed evidence of developmental delay. A likely explanation for
this is that the assay used here was performed on peripheral blood-derived
cells whereas the phenotype in which we are interested is
developmental and neural. Many developmental loci appear to
show tissue specific differences in DNA methylation [36]. In this
regard the ability to model brain development using cerebral
organoids from patient-derived pluripotent cells [37] may enable
more interpretable transcriptional and epigenomic analyses of the
consequences of CGG-repeat expansion on AFF3.

Nonetheless, all individuals for whom a significant methylation
frequency was measured show an expanded allele in the pre- or
full mutation range. Repeat sizes of >300 do correlate with
neurodevelopmental effects and expression of the fragile site in a
significant percentage of cells. Carriers of an expanded allele in the
premutation range are phenotypically normal but may show lower
levels of expression of the fragile site.
Figure 5. AFF3 protein-domain and gene structure. A. Diagrammatic representation of the AFF3 protein, showing the N-terminal homology
domain (NHD), C-terminal homology domain (CTHD) and two predicted nuclear localisation signals. The 18 amino acids encoded by the alternatively
spliced exon 4 are represented as an "insertion" at position 18 of isoform 1. The 6-bp in-frame deletion we identified in exon 14 of subjects AI.1 and
AII.4, removing two amino acids (positions 619 and 621, respectively), is indicated in red. This deletion was predicted to be benign. B. Genomic
structure of AFF3 with the alternately used spliced exon 4 shown in red with asterisk. AFF3 exons 2, 3 and the alternate first exon are not separately
resolvable at this resolution but are shown in detail in panel C. Transcription left to right is shown in blue, right to left in brown. CAGE tag defined
transcription start sites are shown aligned with annotated gene structure (blue forward strand transcription, brown reverse strand transcription
shown with negative counts). Y-axis values show average tags per million (TPM) from CAGE libraries from the indicated tissue groups. C. Finer details
of the AFF3 TSS regions are shown along with histograms of human fetal brain RNA-seq read coverage; three replicates are coloured separately.
Splicing of the intron between the alternate first exon and exon 3 (blue asterisk) was supported by 9 independent RNA-seq reads and found in all
three replicates. The CGG repeat and abnormally methylated region (AMR) are shown in red and green respectively. The major transcription start sites
are shown as black arrowed lines. There is no supporting evidence for the RefSeq TSS represented by the pink arrowed line in our data. D. Alignment
of human CGG repeat region and associated TSS with the orthologous mouse region. Nucleotides are color coded (A = green, G = yellow, C = blue,
T = red, alignment gaps are grey). Orange histograms show the predicted G-quadruplex forming potential of the human and mouse sequences. Outer
histograms show CAGE tag 5' ends at single nucleotide resolution in both human (top) and mouse (bottom). TPM counts shown are the average from
brain derived CAGE libraries in each species and represent the precise location of transcription initiations.
doi:10.1371/journal.pgen.1004242.g005



In one individual (AII.4) a mosaic deletion of 134 bp removed
the entire CGG repeat and the CpG island on the deleted allele
remains unmethylated (data not shown). A similar combination of
a full mutation with an expanded repeat and a deletion
encompassing the repeat has been reported in several fragile X
syndrome patients and recently also in a myotonic dystrophy type
1 case [38,39]. In the fragile X syndrome, the phenotype of
deletion cases is often indistinguishable from that of carriers of an
expanded repeat, a reported exception being an unaffected
individual where the deletion is the major allele present, and the
transcription and translation start sites are preserved [40].

AFF3 belongs to a family of nuclear transcription activating 
factors including AFF1/AF4, AFF2/FMR2 and AFF4/AF5q31 
[33,41,42]. These proteins form super elongating complexes (SEC) 
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somites 



Figure 6. Site and stage specific expression of Aff3 in mouse embryos. A. Photomicrograph of a lateral view of a 12.5 dpc embryo with the 
blue staining representing the gene expression of Aff3. The only area of strong expression at this stage is in the hand plate. B. A false-coloured 
but unthresholded image of the brightfield optical projection tomography (OPT) scan of an 11.5 dpc embryo, with digital sagittal and coronal sections 
from the same embryo shown in C and D. Strong expression is seen in the somites, upper limb bud, the fusing primary palate and the diencephalon 
and prosomere 1 regions of the developing brain. 
doi:10.1371/journal.pgen.1004242.g006 



with active P-TEFb (positive transcription elongation factor) and 
AF9/ENL. SECs regulate the induction and expression of 
different subsets of genes. AFF3 is the closest paralog of AFF2, 
and the two are 36% identical at the amino acid sequence level. They share 
functional domains including the N-terminal Homology Domain 
(NHD), the C-terminal Homology Domain (CTHD) involved in 
intranuclear localization and binding of G-quadruplex RNA 
structures, the ALF domain that promotes protein degradation 
through the proteasome pathway, and the transactivation 
domain (TAD). Intriguingly, the highly conserved intron 2 TSS 
sequences, and to a lesser extent the CGG repeat itself, are 
predicted to have a strong propensity to form G-quadruplex 
Table 2. Pyrosequencing reliably quantified the methylation of four consecutive CpG dinucleotides per individual 
(chr2:100721843-100721885; hg19). 

Individual     CGCAA      CGgagagcaggtc   CGggtggaagaggtttcctccg   CGc 

Family A 
  I.1          5±1.6      4±1.3           5±0.5                    2±0.5 
  II.3         18±2.0     17±3.8          20±3.4                   14±2.6 
  II.4         24±2.9     21±2.9          24±3.8                   18±2.5 
  III.1        26±2.7     22±1.5          25±3.0                   21±1.1 
Family B 
  I.2          27±3.1     26±3.1          28±3.6                   22±3.1 
  II.1         45±0.4     44±3.2          48±2.4                   33±2.4 
Family C 
  I.1          5±1.7      3±1.3           3±1.7                    2±2.8 
  I.2          40±3.6     40±4.4          34±4.3                   35±3.6 
  II.1         54±4.5     52±2.8          60±3.5                   45±3.4 

A fifth CpG site is preceded by a 7 base pair T stretch (after bisulfite conversion), making it impossible to assess the exact percentage of cytosines compared to thymines 
for that particular site due to the limitations of the pyrosequencing technique. For each individual at least three separate analyses of different bisulfite conversions were 
performed. Average methylation percentages and confidence intervals are presented in this table. Methylation cut-off value was set at 10% according to the 
manufacturer's guidelines. 
doi:10.1371/journal.pgen.1004242.t002 



structures (Figure 5D, orange bars), with the most downstream of 
these being present in the 5' UTR of the produced transcript. 
Given that AFF3 is known to bind G-quadruplexes, there is the 
potential for AFF3 autoregulation at this promoter. Both AFF2 and 
AFF3 localize to nuclear speckles and modulate splicing efficiency 
[43]. The expression pattern of murine Aff3 overlaps to a 
considerable extent with that of murine Aff2 [43,44]. 

FRAXE is associated with loss of expression of AFF2 through 
dynamic repeat expansion of a CGG repeat in the 5' UTR. 
FRAXE causes an X-linked non-syndromic intellectual disability 
[45]. AFF2 may play an important role in learning, memory, 
and language-learning processes [46]. Moreover, rare missense 
variations in the highly evolutionarily conserved sites of the AFF2 
gene might be associated with autism spectrum disorder [47]. The 
Aff2 knockout mouse model shows specific cognitive deficits, 
including an impaired conditioned fear memory over longer 
periods and enhanced long-term potentiation in the hippocampus 
[48,49]. Aff3 expression is upregulated in cortical neurons during 
the initial steps of cortical differentiation and is downregulated in 
postnatal cortex, indicating its involvement in brain development 
[44]. We have shown that Aff3 shows strong regional expression in 
the developing mouse brain. 

AFF3 is thus a reasonable candidate for the neurodevelopmental 
features seen in FRA2A carriers in our families. Our data predict 



[Figure 7: sequence chromatograms around rs4851214. Trace labels: BI.2 gDNA (CAGCGCYGTGGTCCAGCA), BI.2 cDNA (CAGCGCYGTGGTCCAGCA), BII.1 gDNA (CAGCGCYGTGGTCCAGCA), BII.1 cDNA (CAGCGCCGTGGTCCAGCA).] 

Figure 7. Analysis of rs4851214, which maps to coding exon 14 of AFF3, using paired genomic DNA and cDNA templates from the 
unaffected carrier mother BI.2 and the affected carrier son BII.1 from family B. A: Both BI.2 and BII.1 are heterozygous for SNP rs4851214 in 
genomic DNA, as T and C peaks are visible at this site. B: Analysis of cDNA from subject BII.1 shows that only the C allele could be detected in this 
patient, indicating monoallelic expression, while in cDNA of subject BI.2 the heterozygous T/C signal was observed, indicating transcription of both 
alleles. 

doi:10.1371/journal.pgen.1004242.g007 
Table 3. Clinical findings in FRA2A carriers. 

Family                                             A         A         A         B              C 
Individual                                         II.3      II.4      III.1     II.1           II.1 
Sex                                                Female    Female    Female    Male           Female 
Age at last assessment (yrs)                       33        31        0.2       12             13 
Growth                                             Normal    Normal    Normal    Normal         Normal 
Delayed motor development                          +         +         ?         +              + 
Delayed speech and language development            +         +         ?         +              + 
Intellectual disability                            -         -         ?         Moderate       Mild 
Required special educational support               +         +         ?         +              - 
General health                                     Good      Good      Good      Reasonable     Reasonable 
Other features                                     -         -         -         anal fissure   macrocephaly, prematurity, hyperreflexia 
FRA2A expression in peripheral blood lymphocytes   26%       40%       31%       40%            26% 

doi:10.1371/journal.pgen.1004242.t003 



that FRA2A carriers are haploinsufficient for AFF3, at least in a 
subset of tissues. A confident assignment of causality to the 
association of AFF3-associated repeat expansion mutations with 
neurodevelopmental anomalies is confounded by the rarity of the 
fragile site and the strong bias in clinical ascertainment. It is, 
however, intriguing that delay in motor and language develop- 
ment appears to be a common feature in the individuals presented 
here and this may represent a true delay in development rather 
than a fixed disability. AFF3 deficiency may then be involved in 
the speed of skill acquisition without impairing the developmental 
capacity. 

A de novo microdeletion of 500 kb on chromosome 2q11.2 
removing only AFF3 [50] has been reported in a girl with a severe 
multisystem disorder consisting of a mesomelic skeletal dysplasia 
(fibular agenesis, abnormal and triangular tibiae, short neck), 
urogenital tract malformations, delayed psychomotor development 
and recurrent apnoea leading to respiratory arrest at the age of 4 
months. This clinical presentation is clearly very different from those 
associated with FRA2A but would be consistent with the 
expression pattern we demonstrate in mouse embryos. The 
clinical differences may be explained by the fact that the 
methylation of the repeat, and thus the inactivation of the AFF3 
gene, presumably takes place several weeks after fertilization, so 
that development during the first weeks is not affected [51]. It is 
also plausible that the transcriptional silencing associated with 
FRA2A may be tissue specific, given that the alternative promoter 
that is immediately adjacent to the expansion mutation shows 
evolutionarily conserved tissue-specific activity and appears to be 
the main driver of AFF3/Aff3 transcription in the brain in humans 
and mouse. 

Both rare and common fragile sites often co-localize with 
evolutionary breakpoints, as was postulated previously by 
Ruiz-Herrera et al. [52,53]. We have shown through FISH and 
BLAST analysis that the region close to the AFF3 repeat is indeed 
involved in a chromosomal rearrangement including a duplication 
and inversion of a 24 kb sequence from 2q13 to 2q11.2, followed 
by an ancestral head-to-head chromosomal fusion at 2q13 
that led to the formation of human chromosome 2. The 2q11.2 
breakpoint of this rearrangement falls within base pairs of the 
repeat and is also present in other primates. The 2q13 region also 
co-localizes with FRA2B, an as yet uncharacterized rare fragile 
site of the same type. 



In conclusion, we report a CGG repeat expansion mutation as 
the molecular cause of the fragile site FRA2A. FRA2A expression 
is associated with methylation of an AFF3 promoter and apparent 
transcriptional silencing of AFF3. It is currently difficult to 
unequivocally link FRA2A to a specific neurodevelopmental 
phenotype but it is plausible that haploinsufficiency for AFF3 in 
the developing brain is related to a true developmental delay and 
possibly mild intellectual disability. 

Materials and Methods 

Ethics statement 

The ethics committees of the participating study centers 
approved the study protocol and all participants gave their written 
informed consent. The study was in accordance with the principles 
of the current version of the Declaration of Helsinki. The fetal 
brain tissue was collected with informed written consent and 
ethical approval by Southampton and South West Hants LREC. 
The fetal tissue was obtained following surgical termination of 
pregnancy and staged according to the Carnegie Classification 
[54,55]. 

Family description: Clinical diagnosis and chromosome 
analysis 

Family A. The proband (AII.4, Figure 1) was originally 
investigated at the age of 7.5 years for mild to moderate learning 
disability and enuresis. She was born at term following an 
uneventful pregnancy. There were no neonatal problems but her 
early motor and language development was reported to be slow. 
Her general health was good and her weight was on the 25th 
centile for her age. She attended a school for children with special 
educational needs. When last seen at the age of 20 years she was 
healthy, was working as a checkout operative in a high-street 
store and had no evidence of a significant functional neurocognitive 
deficit. Her elder sister (AII.3) had been investigated several 
years earlier for moderate global learning disability and had 
attended the same school for children with special educational 
needs. Again, she displayed no evidence of significant cognitive 
impairment when seen at the age of 26 years. Indeed she was very 
much valued in the workplace for remembering numerical codes 
for almost the entire stock inventory. AII.3 has a healthy daughter 
(AIII.1), born after an uneventful pregnancy. Evaluation of AIII.1 
at the age of 3 weeks revealed no phenotypic abnormality. No 
intellectual problems were apparent in any other relatives of the 
three-generation family tree. DNA analysis of the FRAXA locus 
was normal and permission to use the remaining peripheral blood 
DNA for research purposes was obtained from patient AII.4 
(Figure 1). An Epstein-Barr-transformed lymphoblastoid cell line 
was established at ECACC (Cambridge, UK) from a peripheral 
blood sample of proband AII.4. The index patient showed FRA2A 
expression in 40% of the examined cells. At the age of 20 years, 
chromosome analysis was repeated and confirmed FRA2A 
expression, this time in 21% of the examined cells. Subjects 
AII.3 and AIII.1 expressed the fragile site FRA2A in 26% and 
31% of the examined cells, respectively. The unaffected father 
(subject AI.1) showed FRA2A expression in 4% of his cells, 
whereas in two unaffected siblings and the mother no indications 
were found for FRA2A expression (Figure 1). 

Family B. The proband (II.1, Figure 1) was born at 39 weeks 
after an uneventful pregnancy. He was the third child of a 
non-consanguineous Caucasian couple. There were no problems in the 
neonatal period. At 8 weeks of age, an anal fistula was diagnosed 
and required three surgical procedures between 5 and 8 months 
for cure. He developed mild asthma at 10 months of age. Food 
allergies were demonstrated at 12 months of age and were 
managed by dietary modification. His height and head 
circumference were within the normal range, but his development was 
slow. He had mild dysmorphic features, including telecanthus, 
slightly short palpebral fissures, smooth philtrum, thin upper lip 
and a small mouth. Psychological assessment at the age of 12 using 
the WISC-IV showed a full-scale IQ in the range 40-52 (<0.1 
centile), within the moderate range of intellectual disability. The 
Vineland Adaptive Behaviour Scales II gave an Adaptive 
Behaviour Composite in the range 55-67, below the first centile 
and in the low range. His parents and two elder brothers had 
normal intelligence and were not dysmorphic. Routine chromosome 
analysis, subtelomere FISH of chromosomes, molecular testing for 
fragile X syndrome and urinary amino acids/organic acids/ 
mucopolysaccharides gave normal results. The proband expressed 
the fragile site FRA2A in 40% of the examined cells. Fragile site 
expression was not examined in the other family members. 

Family C. The proband (II.1, see Figure 1) was born at 33 
weeks after an uneventful pregnancy. She had respiratory distress 
syndrome and was treated with surfactant and ventilated for five 
days. There were no significant complications in the neonatal 
period. Her early development was delayed and she suffered from 
intermittent asthma and required ventilator tubes for serious otitis 
media. At three years of age, she was seen because of speech delay: 
her expressive language was estimated to be at the 2 year level 
while receptive language and general development were assessed 
at the 2.5-3.0 year level. She was noted to be macrocephalic, 
hypotonic and hyperreflexic. Height was normal. At 13 years she is 
in an age appropriate class in high school and undertakes all 
subjects, except mathematics, with her peers, but struggles to keep 
up. She has poor concentration, is easily distractible and does not 
like changes in routine or to immediate expectations. She is doing 
well socially. The following investigations gave normal results: CT 
brain scan, urine amino acids and organic acids, chromosomes, 
creatine kinase and thyroid function tests. The proband showed 
FRA2A expression in 26% of the examined cells. Fragile site 
expression was not examined in the other family members. 

Fluorescent in situ hybridisation (FISH) mapping 

Peripheral blood lymphocyte-derived metaphase chromosome 
preparations from individual AII.3 were obtained using standard 
methods. An AFF3-containing BAC clone from the RPCI library, 
RP11-549H5 (AC092667), and clones mapping centromeric 
(RP11-436F6, AC010736) and telomeric (RP11-506F3, 
AC074387) to AFF3 were obtained from the BACPAC Resource 
Center (Oakland, California, USA). Long Range-PCR (LR-PCR) 
was used to generate probes of 10 kb and 18 kb situated 
respectively immediately 5' and 3' to the promoter region of 
AFF3. The following primer pairs were used: L10K (forward 
5'-TGCAGGAATGAATGAAGGGCAAGCAA-3' and reverse 
5'-TGGCCTCTGGGTGTCGACTTCAAACT-3') and L18K (forward 
5'-ACAGTTTGGCTTGACCGGGAGGGTTT-3' and reverse 
5'-TCAAAAATGTTCCCTTGCCCACAGTGC-3'). 
LR-PCR was performed using the Expand Long Template 
PCR System (Roche, Basel, Switzerland) according to the 
manufacturer's instructions. The amount of BAC DNA used 
per reaction was 5-10 ng. All probes were labelled with 
digoxigenin-11-dUTP or biotin-16-dUTP (Roche, Indianapolis, 
IN) by nick translation. DNA hybridisation and antibody 
detection were carried out as described previously [56]. At least 
five metaphases were analysed for each hybridisation, using a 
Zeiss Axioplan 2 fluorescence microscope equipped with a triple 
band-pass filter (#83000 for DAPI, FITC and Texas Red; 
Chroma Technology, Brattleboro, VT). Images were collected 
using a cooled CCD camera (Princeton Instruments Pentamax, 
Roper Scientific, Trenton, NJ) and analysed using IPLab software 
(Scanalytics, Vienna, VA). 

PCR amplification and hybridization of the FRA2A- 
associated CGG repeat 

PCR amplification of the normal FRA2A CGG repeat was 
performed with the aid of 2.5x PCR Enhancer solution 
(Invitrogen, Carlsbad, CA, USA) using a forward primer P1 
(5'-GGCCGTAAAAGCCACGAGAGAGGG-3') and a reverse 
primer P2 (5'-CTTGCGCGCAGGCACACTCAAGAG-3') 
derived from the sequences flanking the repeat. PCR products were 
sequenced and subsequently analysed by use of an ABI Prism 3130 
Genetic Analyzer (Applied Biosystems, Foster City, CA, USA). 
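Once a normal-range allele has been sequenced, its repeat number can be read directly from the sequence. The following is a minimal sketch of that counting step, not the authors' analysis pipeline: the function name and the example amplicon (with invented flanking bases) are assumptions made purely for illustration.

```python
import re


def longest_cgg_run(sequence: str) -> int:
    """Return the number of units in the longest uninterrupted CGG run."""
    seq = sequence.upper()
    # Each match is a maximal stretch of back-to-back CGG triplets.
    runs = re.findall(r"(?:CGG)+", seq)
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical read: invented flanks around an 8-unit CGG repeat.
example_read = "ACGTTAGC" + "CGG" * 8 + "TTAGCA"
print(longest_cgg_run(example_read))
```

This approach only suits alleles short enough to be amplified and sequenced across the whole repeat; large expansions need the Southern blot or repeat-primed PCR methods described in this section.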

A Southern blot was created by digesting 10 µg of DNA, 
extracted from blood or Epstein-Barr-virus transformed cells, with 
the restriction enzymes HindIII and NcoI (Fermentas GmbH, St. 
Leon-Rot, Germany) in separate reactions. The digested DNA was 
then separated by electrophoresis on a 0.7% agarose gel. No 
ethidium bromide was added during this electrophoresis step to 
avoid product-related smearing on the gel that would cause 
overestimation of the mosaicism of repeat sizes [57]. After 
subsequent denaturation and neutralisation the DNA fragments 
were transferred to Hybond N+ membranes. Hybridisation was 
performed at 65°C using a specific 32P-labeled 992 bp PCR probe 
(forward primer 5'-AGCCTTTGTTCCTGGGAATGCTGTCTCAAT-3' 
and reverse primer 
5'-GGAAAGGCAGGTGATCAGCTAGA.\GGGTG-3'). 

Repeat primed PCR was performed to interrogate the number 
of CGG repeats in the AFF3 gene with Asuragen's CGG Repeat 
Primed PCR system designed for detection of Fragile X expanded 
alleles. Triplet repeat primed PCR (TP-PCR) uses a locus-specific 
forward primer that flanks the repeat. The reverse PCR primer of 
the primer pair is designed to hybridise within the CGG repeat 
region as it contains a (GCC)5 tail. This generates amplicons of 
various sizes as the reverse primers bind to multiple locations 
during TP-PCR. As the number of CGG repeats increases, a 
characteristic ladder profile appears on the fluorescence 
electropherogram, enabling the rapid and inexpensive identification of 
expanded repeats that may have been missed using current PCR 
methods. Samples were PCR-amplified by preparing a master 
mix containing 11.45 µl of GC-rich AMP buffer, 0.25 µl of 
FAM-labelled AFF3 forward primer 
(5'-GGCCGTAAAAGCCACGAGAGAGGG-3'), 0.25 µl of AFF3 reverse primer 
(5'-CTTGCGCGCAGGCACACTCAAGAG-3'), 0.5 µl of CGG 
primer (5'-TACGCATCCCAGTTTGAGACGGCCGCCGCCGCCGCG-3'), 
0.5 µl of nuclease-free water, and 0.05 µl of 
GC-rich polymerase mix from Asuragen (Austin, TX). 2 µl of the 
DNA sample, typically at 20 ng/µl, was added before transferring 
the plate to a thermal cycler (9700, Applied Biosystems, Foster 
City, CA). Samples were amplified with an initial heat 
denaturation step of 95°C for 5 minutes, followed by 10 cycles of 97°C for 
35 seconds, 62°C for 35 seconds, and 68°C for 4 minutes, and 
then 20 cycles of 97°C for 35 seconds, 62°C for 35 seconds, and 
68°C for 4 minutes with a 20 second autoextension at each cycle. 
The final extension step was 72°C for 10 minutes. After PCR, 2 µl 
of the PCR product was added to a mix with 11 µl HDF and 2 µl 
Rox 1000 standard. After a brief denaturation step, samples were 
analysed using the ABI Prism 3130 Genetic Analyzer. 
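The per-reaction volumes listed above add up to 13 µl of master mix plus 2 µl of DNA, for a 15 µl reaction. As a hedged illustration of how such a recipe scales to a plate, the sketch below multiplies each component by the number of samples. The 10% pipetting overage and the function name are assumptions, not part of the published protocol.

```python
from decimal import Decimal

# Per-reaction volumes in microlitres, as stated in the protocol text.
PER_REACTION_UL = {
    "GC-rich AMP buffer": Decimal("11.45"),
    "FAM-labelled forward primer": Decimal("0.25"),
    "reverse primer": Decimal("0.25"),
    "CGG primer": Decimal("0.5"),
    "nuclease-free water": Decimal("0.5"),
    "GC-rich polymerase mix": Decimal("0.05"),
}
DNA_UL = Decimal("2")  # added per well, not part of the master mix


def master_mix(n_samples: int, overage: Decimal = Decimal("0.10")) -> dict:
    """Scale each component for n_samples plus an assumed pipetting overage."""
    factor = Decimal(n_samples) * (1 + overage)
    return {name: volume * factor for name, volume in PER_REACTION_UL.items()}


mix_per_reaction = sum(PER_REACTION_UL.values())
print(f"master mix per reaction: {mix_per_reaction} µl")
print(f"final reaction volume:   {mix_per_reaction + DNA_UL} µl")
for name, volume in master_mix(8).items():
    print(f"{name}: {volume:.2f} µl")
```

Decimal arithmetic is used so that small volumes such as 0.05 µl scale without floating-point rounding noise.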

Methylation analysis of the AFF3 promoter 

The methylation status of the AFF3-associated CGG repeat and 
the surrounding region was analysed by bisulfite sequencing. 
Genomic DNA collected from lymphoblastoid cell lines and saliva 
(subject AIII.1) was bisulfite-treated using the EpiTect Bisulfite Kit 
(Qiagen, Venlo, Netherlands) according to the manufacturer's 
guidelines. Bisulfite treatment converts all non-methylated cytosines into 
uracils, which are read as thymines after PCR amplification, while 
methylated cytosines remain unchanged. Primers 
specific for the methylated bisulfite converted DNA (forward 
5'-GGTGAGAAATAAAAAGAAAGGAG-3' and reverse 
5'-CGTCAACAAGCCTAAAATAGG-3') were designed. After PCR 
amplification, the CGG surrounding area 
(chr2:100721494-100721911; hg19) was sequenced using an ABI Prism 3130 
DNA sequencer. In addition, pyrosequencing was performed 
using the AFF3_002 PyroMark CpG assay according to the 
manufacturer's instructions (Qiagen, Venlo, Netherlands), and the results 
were analysed on a PyroMark Q24. Methylation cut-off value was set 
at 10%. 
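The logic of bisulfite sequencing, and of the 10% cut-off, can be sketched in a few lines. The sketch is illustrative only: the toy sequence, the function names and the choice to treat a value exactly at the cut-off as methylated are assumptions, and the code models only the top strand as it is read after PCR.

```python
def bisulfite_convert(seq: str, methylated_positions: frozenset = frozenset()) -> str:
    """Simulate the top-strand read-out after bisulfite treatment and PCR.

    Unmethylated C is read as T; C at a methylated position stays C.
    Positions are 0-based indices into seq.
    """
    converted = []
    for index, base in enumerate(seq.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")
        else:
            converted.append(base)
    return "".join(converted)


def is_methylated(percent: float, cutoff: float = 10.0) -> bool:
    """Call a CpG methylated when its percentage reaches the cut-off (assumed >=)."""
    return percent >= cutoff


toy = "ACGTCCGA"  # invented sequence with CpGs at positions 1 and 5
print(bisulfite_convert(toy))                  # no methylation
print(bisulfite_convert(toy, frozenset({1})))  # first CpG methylated
print(is_methylated(5.0), is_methylated(45.0))
```

The two percentages in the last line correspond to the first CpG of individuals A I.1 and B II.1 in Table 2, which fall below and above the cut-off respectively.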

Gene-expression analysis 

The expression pattern of AFF3 in human tissues was studied 
using a multiple-tissue Northern blot (FirstChoice Northern 
Human Blot I, Ambion, Austin, TX, USA). The specific AFF3 
probe was a 655-bp PCR product (forward primer 
5'-TATCGAGTGTGGAAATGCAA-3' and reverse primer 
5'-TGAGGTCCCTATGACAGGTG-3') and was radiolabelled by the 
addition of 32P-dATP and 32P-dCTP (MP, Irvine, California, USA). 
Hybridisation was performed according to the manufacturer's 
instructions. 

Total RNA-sequencing data from the Illumina Human Body 
Map 2.0 project (GSE30611) were obtained from the NCBI Gene 
Expression Omnibus. Data from brain (ERR030882, female, 77y), 
ovary (ERR030874, female, 47y) and lymph node (ERR030878, 
female, 86y) were downloaded in the form of 2x50 bp reads and 
imported into CLC Genomics Workbench v6.01. Transcriptomics 
analysis was performed within CLC Genomics using the human 
reference genome version hg19. Default settings were used, apart 
from a smaller expected insert size of 50 bp. Additionally, total 
RNA was isolated from the human fetal brain tissues (FB1 54 
gestational days (GD), FB2 47 GD, FB3 59 GD) according to the 
Trizol (Invitrogen) protocol. The preparation of amplicon libraries 
and RNA-Seq analysis were performed following standard 
Illumina TruSeq protocols and reads of length 50 bp were 
produced on the Illumina GAIIx platform. The fetal tissue was 
obtained following surgical termination of pregnancy and 
staged according to the Carnegie Classification [54,55]. CAGE 
tag data were obtained from the FANTOM4 consortium as both 
pre-defined CAGE tag clusters 
(http://fantom.gsc.riken.jp/download/Tables/human/CAGE/promoters/tag_clusters/ and 
http://fantom.gsc.riken.jp/4/download/Tables/mouse/CAGE/promoters/tag_clusters/) 
and as genome aligned individual tags 
(http://fantom.gsc.riken.jp/4/download//Tables/human/CAGE/mapping/). 
Coordinates were converted from the hg18 reference 
genome assembly to hg19 using LiftOver 
(http://genome.ucsc.edu/cgi-bin/hgLiftOver). Statistical analysis was performed in R 
(http://www.R-project.org/; version 3.0.0). 

The AFF3 transcript NM_002285 (http://www.ncbi.nlm.nih.gov/) 
was searched for cSNPs. The cSNPs in FRA2A 
patients were tested by PCR followed by sequencing at the 
genomic and at the cDNA level. RNA was isolated from 
Epstein-Barr-transformed lymphoblastoid cells using Trizol 
(Invitrogen, Carlsbad, CA, USA) and converted to cDNA using 
Superscript III reverse transcriptase (Invitrogen, Carlsbad, CA, 
USA). The following primer sets were used: SNP1 (rs4851214) 
in exon 14 (forward primer 5'-AGTGATGAAGAGGAGAATGAACA-3' 
and reverse primer 5'-ATAGGAGGCTTGTGGGGATTA-3') 
and SNP2 (rs13427251) in exon 25 (forward primer 
5'-GTGTGTCTGGTATGTTTACAC-3' and reverse primer 
5'-GGATCAGCATCTAGTCTAAG-3'). Sequencing products were analysed on an ABI 
3130 Prism automatic sequencer. 
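The cSNP test compares the base call at a heterozygous site in genomic DNA with the call at the same site in cDNA. The sketch below expresses that comparison using IUPAC ambiguity codes (for example, Y denotes C or T). It is a simplified illustration with hypothetical function and label names; real calls come from inspecting chromatogram peaks, not from single letters.

```python
# IUPAC codes for single bases and two-base ambiguities.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
}


def expression_pattern(gdna_call: str, cdna_call: str) -> str:
    """Classify allelic expression at one SNP from gDNA and cDNA base calls."""
    genomic = IUPAC[gdna_call.upper()]
    expressed = IUPAC[cdna_call.upper()]
    if len(genomic) < 2:
        return "uninformative"  # homozygous in gDNA, alleles cannot be told apart
    if expressed == genomic:
        return "biallelic"
    if len(expressed) == 1 and expressed <= genomic:
        return "monoallelic"
    return "inconsistent"       # cDNA shows a base absent from gDNA


# Calls at rs4851214 as labelled in Figure 7.
print("BI.2: ", expression_pattern("Y", "Y"))
print("BII.1:", expression_pattern("Y", "C"))
```

A homozygous genomic call is reported as uninformative because, without a heterozygous marker, transcripts from the two alleles cannot be distinguished by sequencing.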

The Aff3 riboprobe for whole mount in situ hybridisation (WISH) to 
mouse embryos was generated by in vitro transcription from a PCR 
template amplified from the 3'UTR using mouse genomic 
DNA as a template. T3 (for sense probe) and T7 (for antisense 
probe) binding sites were added to the forward 
(5'-AATTAACCCTCACTAAAGGGTCTCCAACCGGATCCAGAAT-3') and reverse 
(5'-TAATACGACTCACTATAGGGAGCCCATGGCACCTCTCT-3') 
primers. The WISH protocol and OPT scanning were 
performed exactly as previously described [58]. 

Mutation analysis of the AFF3 gene and marker analysis 

All coding exons of the AFF3 gene were PCR amplified at the 
genomic level using standard protocols for all patients and relatives 
to exclude the presence of any other disease-causing mutation. 
PCR products were enzymatically purified and sequenced. 
Sequences were analysed with an ABI Prism 3130 DNA 
sequencer. 

For the marker analysis, genomic DNA was isolated from 
peripheral blood from all available family members using standard 
procedures. Highly polymorphic microsatellite markers, D2S2209 
and D2S2311, were selected from the Marshfield genetic map in 
the proximity of the repeat. These markers are both dinucleotide 
repeats with an average heterozygosity of 71%. Analysis was performed by 
GoTaq DNA polymerase mediated PCR with fluorescently 
labeled primers. Fragment analysis of amplified products was 
performed using an ABI PRISM 3130 XL Genetic Analyzer 
(Applied Biosystems). Allele identification was done with 
GeneMapper v3.7 software (Applied Biosystems). 
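Marker heterozygosity indicates how often a marker is expected to be informative in a family. One common definition, assumed here, is the expected heterozygosity H = 1 - sum(p_i^2) over the allele frequencies p_i. The frequencies in the sketch below are invented for illustration and are not those of D2S2209 or D2S2311; the function name is also hypothetical.

```python
def expected_heterozygosity(allele_frequencies: list) -> float:
    """Expected heterozygosity H = 1 - sum(p_i^2) for one marker."""
    total = sum(allele_frequencies)
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in allele_frequencies)


# Hypothetical four-allele microsatellite (frequencies are invented).
frequencies = [0.4, 0.3, 0.2, 0.1]
print(f"H = {expected_heterozygosity(frequencies):.2f}")
```

With these invented frequencies H is about 0.70, of the same order as the 71% quoted for the two markers.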

Supporting Information 

Figure S1 CGG-repeat region in AFF3 intron 2. Sequence of 
the CGG-repeat region in intron 2 of the AFF3 gene shown in 
telomeric-centromeric orientation. The CGG repeat is shown in 
bold blue text. The repeat lies within the 134 bp region that is 
deleted in subject AII.4, which is shown in bold black text. Forward 
and reverse primers used for the amplification of the CGG repeat 
are indicated with blue arrows. The primers used for bisulfite 
sequencing (chr2:100721494-100721911; hg19) are indicated with 
orange arrows and CpG sites that were analysed with bisulfite 
pyrosequencing are represented in bold orange text. 

(TIF) 

Figure S2 A: Fragment-length analysis of regular PCR and 
TP-PCR generated products of the CGG repeat in the AFF3 
gene of family A. Fluorescently-labeled PCR products of all 
individuals of family A were separated by capillary 
electrophoresis on an ABI PRISM 3130 XL Genetic Analyzer. For every 
individual a PCR covering the entire repeat was analyzed in 
addition to a repeat primed PCR (Asuragen). Individual AI.1 
appeared homozygous for a repeat with 8 units as no fading 
repeat-signal is present after repeat-primed PCR (right corner). 
For individual AII.4 the 134 bp-deletion of the repeat and 
surrounding region is clearly detected in addition to a short 
8-unit repeat. An expanded allele of over 300 units is present in 
this individual as shown with repeat primed PCR. This 
expanded allele could not be covered by regular PCR covering 
the entire repeat. In individuals AII.3 and AIII.1 a normal 
range repeat of 8 and 18 repeated units respectively was 
detected in addition to an expanded allele containing over 300 
units. B: Fragment-length analysis of regular PCR and TP-PCR 
generated products of the CGG repeat in the AFF3 gene of 
family B. In individual BI.2 one normal range allele with 15 
repeated units was identified. In addition, a second slightly 
expanded allele of about 120 repeated units was detected by 
regular PCR covering the repeat. This expansion was confirmed 
with repeat primed PCR. In individual BII.1 a normal range 
repeat of 8 was detected in addition to an expanded allele 
containing over 300 units. The trace labelled FR_blanco 
represents a blank reference lane. C: Fragment-length analysis 
of regular PCR and TP-PCR generated products of the CGG 
repeat in the AFF3 gene of family C. The father of family C, 
CI.1, is heterozygous for the number of repeated units, 
displaying two alleles with respectively 5 and 8 repeated units. 
In the mother, CI.2, a second slightly expanded allele with 106 
repeated units was detected in addition to a normal range allele 
with 17 CGG-units by regular PCR. This expansion was 
confirmed with repeat primed PCR. In individual CII.1 a 
normal range repeat of 8 was detected in addition to an 
expanded allele containing over 300 units. For each individual 
of this family the raw analysis data of the genetic analyzer are 
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